# Vision-Language Joint Modeling
Vica2 Stage2 Onevision Ft
Apache-2.0
ViCA2 is a 7B-parameter multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.
Video-to-Text
Transformers English

nkkbr
63
0
Ret CLIP ViT L 14
Apache-2.0
ReT is a retrieval method that supports multimodal queries and documents, achieving fine-grained retrieval by fusing multi-level representations from the vision and text backbone networks.
Multimodal Fusion
Transformers

aimagelab
523
0
Image Captioning Model
Apache-2.0
A model that combines a Vision Transformer (ViT) with natural language processing to automatically generate natural-language descriptions of input images.
Image-to-Text
premanthcharan
28
1
Paligemma Multimodal Query Rewrite
A multimodal query rewrite model fine-tuned from google/paligemma-3b-pt-224.
Image-to-Text
Transformers

utischoolnlp
31
1
Llava V1.6 Vicuna 7b
LLaVA is an open-source multimodal chatbot trained by fine-tuning a large language model on multimodal instruction-following data.
Image-Text-to-Text
Transformers

liuhaotian
31.65k
123
Llava Int4
CC
LLaVA is a large multimodal model that serves as a general-purpose visual assistant by connecting a visual encoder to a large language model.
Image-Text-to-Text
Transformers

emon-j
40
2
Filmtitle Beit GPT2
Apache-2.0
A Chinese movie poster title generation model based on a BEiT visual encoder and a GPT-2 text decoder.
Image-to-Text
Transformers Chinese

snzhang
22
2
Matcha Chartqa
Apache-2.0
MatCha is a pretrained model that improves the ability of vision-language models to jointly process charts and language, and it performs well on chart question answering tasks.
Image-to-Text
Transformers Multilingual

google
1,060
41
Matcha Chart2text Pew
Apache-2.0
MatCha is a vision-language model based on the Pix2Struct architecture, specifically optimized for chart comprehension and numerical reasoning tasks, excelling in chart-based question answering.
Image-to-Text
Transformers Multilingual

google
168
39